DESIGNING DEGENERATE PCR PRIMERS

Field of the invention 

The invention relates to a method of designing a panel of primers for 
detecting viruses in a high throughput polymerase chain reaction assay. 

Background of the invention 
All organisms, including bacteria, animals and plants, appear to be capable
of being infected by viruses. Viruses require the use of the cellular translation and
transcription machinery to replicate. In the process of replication they often have
deleterious effects on the host cell and thus on the host organism. Viruses constitute
an important class of pathogens causing many diseases, leading to loss of life in
humans and economic loss in the agricultural industries.

Summary of the invention 

The polymerase chain reaction (PCR) allows the amplification of a specific 
region of a polynucleotide. The specificity of the reaction is due to the primers
which during the course of PCR bind to the region to be amplified in a sequence
specific manner. The invention provides a method of designing primers which can 
be used in high throughput screening to detect viruses. The method may be used to 
detect unknown viruses which have not yet been sequenced. 

In particular the invention provides a method of designing a panel of
degenerate primer pairs for screening for new members of multiple known virus
families in a biological sample, wherein each primer pair in the panel binds a
sequence that is conserved across members of a said virus family and selectively
directs amplification of sequence of said family by PCR, which method comprises

(a) providing a plurality of amino acid sequences from members of a first
virus family,

(b) comparing the sequences to identify conserved regions, 

(c) designing a first primer pair using a computer based method, wherein each 
primer in the pair binds a nucleotide sequence that encodes a conserved region 
identified in (b) and wherein the primer pair is designed to amplify by PCR the
nucleotide sequence between the nucleotide sequences that encode conserved regions 
in members of the first virus family, and 

(d) repeating steps (a) to (c) for each virus family.

The invention also provides a method of designing a panel of degenerate
primer pairs for screening for new members of multiple known virus families in a
biological sample, wherein each primer pair in the panel binds a sequence that is
conserved across members of a said virus family and selectively directs amplification
of sequence of said family by PCR, which method comprises

(a) providing a plurality of nucleotide sequences from members of a first
virus family,

(b) comparing the sequences to identify conserved regions, 

(c) designing a first primer pair using a computer based method, wherein each
primer in the pair binds a conserved region identified in (b) and wherein the primer
pair is designed to amplify by PCR the nucleotide sequence between the conserved
regions in members of the first virus family, and

(d) repeating steps (a) to (c) for each virus family. 

The invention additionally provides a panel of primers which has been 
designed by the method of the invention. 

Detailed description of the invention 

The invention provides a method of designing a panel of primer pairs which
can be used in high throughput virus screening. The method comprises initial steps
which deduce the sequences of the primers using computer based calculations, and
optional later steps in which the primers are synthesised and tested empirically, for
example to determine optimal PCR conditions and/or to select primer pairs with
desired further properties.

The panel of primers provided by the method is designed to be capable of
detecting unknown viruses based on nucleotide and/or amino acid sequences in the
unknown virus which are similar (homologous) to nucleotide and/or amino acid
sequences in a known virus. These conserved sequences typically have a role in
providing a necessary or advantageous activity or property to the virus. Conserved
nucleotide sequences may be coding or non-coding sequences.

In one embodiment the conserved sequences code for or are from virus 
proteins which have the following activities: DNA or RNA polymerase (replicase), 
topoisomerase (helicase/gyrase), endonuclease (integrase), nucleic acid binding
protein, protease, transcription factors, envelope glycoproteins, structural protein 
(e.g. capsid or nucleocapsid protein). 

The panel of primers is designed to detect viruses which are single stranded 
or double stranded DNA or single stranded or double stranded RNA viruses. The
viruses are generally capable of infecting prokaryotic or eukaryotic cells, such as
bacterial, animal, plant, yeast or fungal cells. Preferably the viruses are mammalian 
(preferably primate) or avian viruses, such as human, pig, horse, sheep, goat, cow, 
chicken, turkey or duck viruses. 

The viruses are typically from any combination of the following families:
Adenoviridae, Arenaviridae, Arteriviridae, Astroviridae, Birnaviridae, Bunyaviridae,
Caliciviridae, Circoviridae, Coronaviridae, Deltavirus, Filoviridae, Flaviviridae, 
Hepadnaviridae, Herpesviridae, Orthomyxoviridae, Papovaviridae, Paramyxoviridae, 
Parvoviridae, Picornaviridae, Polydnaviridae, Poxviridae, Reoviridae, Retroviridae, 
Rhabdoviridae, Togaviridae or Bornavirus. 

The primers of the panel are capable of detecting unknown viruses in a
biological sample. Such a sample either originates from a single individual or is a
pooled sample from individuals of the same species. Thus the panel of primers 
detects viruses which infect the same species (from which the sample originates). 

Generally in the method at least 15, 30, 50, 100, 200 or more, typically up to
a maximum of 300, different primer pairs are designed. The primer pairs designed in
the method bind sequence which is conserved across members of a virus family. The
panel which is designed in the method may comprise primer pairs that bind sequence
which is conserved across substantially all members of the family or across a subset
of the members of the family, for example across all members of a subfamily or of a
genus. Generally, the primer pairs bind at least 70%, at least 80%, or at least 90% of
the known viruses of the family, subfamily or genus. Preferably fewer than 10, such as
fewer than 5, primer pairs will be used for the detection of any given family, subfamily
or genus in the panel.

The panel of primer pairs is generally capable of detecting viruses from at
least 10, 15, 20, 30 or more families, typically up to a maximum of 35 families.

The panel of primer pairs may comprise sets of primer pairs which perform a
nested PCR reaction. Generally such a set of primer pairs comprises a first and
second primer pair. The first primer pair is able to amplify a template nucleotide
sequence from a virus to form a PCR product. The second primer pair is able to
amplify a nucleotide sequence using the PCR product generated by the first primer
pair as a template. The use of nested sets of primer pairs allows increased sensitivity.

In a preferred embodiment each primer pair is specific for a particular virus 
family, so that it does not detect viruses of other families. 

In the method of the invention the plurality of amino acid sequences or 
nucleotide sequences are provided from different known viruses of the same family. 
The amino acid sequences or nucleotide sequences will be for the same protein of the
different viruses. Typically at least 5, 10, 20, 50, 100 or more sequences are 
provided. The maximum number of sequences provided will, for example, be 300 
sequences. 

Each of the sequences which is provided is typically at least 20, 50, 100, 200
or more amino acids or nucleotides in length. In general the maximum length of the
nucleotide sequences is 1000 nucleotides and the maximum length of the amino acid
sequences is 300 amino acids. The sequences may be obtained from a database of
sequences, such as GenBank. The sequences may be obtained from a database
comprising virus sequences which are organised into homologous protein families
(based on sequence similarity relationships).

In a preferred embodiment the sequences are obtained from the VIDA
database (described in Alba et al (2001) Nucleic Acids Research 29, 133-136) or the
Virus Division of GenBank. The sequences may be provided in the form of a
database, preferably in computer-readable form. The sequences are preferably
provided in the form of a computer-readable database constructed using programs
which identify homologous protein families, such as GeneTableMaker, MKDOM or
PSCBuilder.

The sequences which have been provided are compared to identify conserved
regions. Typically such conserved regions will have a length of at least 12
nucleotides, such as at least 15, 21, 27, 36, 99 or more nucleotides (generally up to a
maximum length of 200 nucleotides) or at least 4, 5, 7, 10, 25 or more amino acids
(generally up to a maximum length of 50 amino acids).

Across the conserved region the virus sequences which have been provided
will of course share identity or similarity. Typically the amino acids or nucleotides
in at least 50% of the positions in the region will be the same in at least 50%, 60%,
70%, or 80% of the viruses of the group (i.e. in the family, genus or subfamily).
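
By way of illustration only, the conservation criterion just described may be sketched as follows. The function names, the gap character "-" and the example sequences are assumptions introduced for this sketch. The sketch assumes that the sequences have already been aligned to equal length.

```python
from collections import Counter

def column_agreement(alignment):
    """Fraction of sequences sharing the most common residue in each column."""
    n = len(alignment)
    fractions = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gap characters
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        fractions.append(top / n)
    return fractions

def is_conserved(alignment, start, length, min_share=0.5, min_positions=0.5):
    """True if at least `min_positions` of the window's columns carry the same
    residue in at least `min_share` of the sequences."""
    window = column_agreement(alignment)[start:start + length]
    good = sum(1 for f in window if f >= min_share)
    return len(window) == length and good / length >= min_positions

# Hypothetical, pre-aligned example sequences (not taken from Table 1).
aligned = ["MKTWFLVP", "MKTWFIVP", "MRTWFLAP", "QKSWFLVA"]
conserved_at_50 = is_conserved(aligned, 0, 8, min_share=0.5)
conserved_at_80 = is_conserved(aligned, 0, 8, min_share=0.8)
```

Raising `min_share` from 0.5 to 0.8 corresponds to moving from the 50% to the 80% figure given above.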

The algorithm which identifies conserved regions generally uses a multiple
sequence alignment method. The method may comprise (a) aligning all pairs of
sequences separately to calculate a distance matrix giving the divergence of each pair
of sequences, (b) calculating a guide tree from the distance matrix, and (c) aligning
the sequences progressively according to the branching order in the guide tree.
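
Steps (a) and (b) above may be sketched as follows, purely as an illustration. The sketch makes two assumptions that are not part of the method as described: Python's difflib.SequenceMatcher is used as a stand-in for a proper pairwise aligner, and simple average-linkage clustering is used to derive the merge order of the guide tree. All names and sequences are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def distance_matrix(seqs):
    """Step (a): pairwise divergence between every pair of sequences."""
    dist = {}
    for a, b in combinations(list(seqs), 2):
        similarity = SequenceMatcher(None, seqs[a], seqs[b], autojunk=False).ratio()
        dist[frozenset((a, b))] = 1.0 - similarity
    return dist

def guide_tree_order(seqs):
    """Step (b): average-linkage clustering; returns the clusters merged at
    each step, closest pair first."""
    dist = distance_matrix(seqs)
    clusters = [(name,) for name in seqs]
    merges = []

    def average(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges

# Hypothetical sequence fragments used only to exercise the sketch.
example = {
    "A": "MFGGLLGEETKRHFERL",
    "B": "MFGGLLGEETKRHFERM",
    "C": "MLRTCDITHIKNNYEAI",
}
merge_order = guide_tree_order(example)
```

In step (c) the sequences would then be aligned progressively in this merge order, the most similar sequences being aligned first.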
A preferred algorithm for aligning the sequences is
CLUSTALW as described in Thompson et al (1994) Nucleic Acids Research 22,
4673-80. Other algorithms that can be used for aligning sequences are MultAlin
(Corpet (1988) Nucleic Acids Research 16, 10881-90) or Jalview (Clamp et al (1998)
http://barton.ebi.co.uk). BLOCKS of conserved regions of amino acids may be
extracted from the multiple alignments, typically using the program Blocks Multiple
Alignment Processor. Alternatively the entire process of performing multiple
alignments and extracting BLOCKS can be performed using BLOCKMAKER
(Henikoff and Henikoff (1994) Genomics 19, 97-107).

The output from the alignment and BLOCK extraction step (i.e. the
information describing the identified conserved regions) is then entered into the
algorithm which designs the primers. Such output is typically in the form of partial
sequences which correspond to the conserved regions (BLOCKS). These BLOCKS
are input into a primer design algorithm. In one embodiment such an algorithm is
CODEHOP.

In the primer design step the conserved regions which are chosen as targets 
for primers preferably comprise few codons with degenerate counterparts, i.e.
preferably the sequence has a low redundancy, such as a redundancy of less than 512
fold, 256 fold or 128 fold. Each primer binds in accordance with Watson-Crick base
pairing and thus the binding is sequence specific. Each primer will thus be designed
to be wholly or partially complementary to the sequence to which it binds.
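
As an illustration of the redundancy referred to above, the following sketch counts the number of distinct nucleotide sequences that could encode a short peptide, assuming the standard genetic code. The function name and the example peptides are assumptions made for this sketch only.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def redundancy(peptide):
    """Number of distinct nucleotide sequences encoding the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())

# Hypothetical candidate target regions of six amino acids.
candidates = ["HGKTWF", "LSRLSR"]
low_redundancy = [p for p in candidates if redundancy(p) < 256]
```

Regions rich in methionine and tryptophan (one codon each) give low redundancy, whereas regions rich in leucine, serine or arginine (six codons each) do not.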

Each of the primers typically has a length of at least 8 nucleotides, such as at 
least 10, 12, 15, 20, 30, 40 or more nucleotides (up to a maximum of 50 nucleotides 
for example). In one embodiment the primer may comprise at least 2, 4 or 6, up to a
maximum of 10 for example, inosine bases. Inosine is able to bind to any of the four
nucleotides and therefore use of inosine causes a reduction in effective redundancy.

Each primer pair will be designed so that the PCR product generally has a 
length of at least 20, such as at least 50, 100, 200, 500, 1000 or more nucleotides 
(and typically up to a maximum of 5x10^3 nucleotides long).

Each primer is preferably designed so that it anneals to a single site, i.e. the 
primer will not bind to any other site in the genome of the relevant virus. 

Each primer is preferably designed so that it does not exhibit secondary 
structure, i.e. the nucleotides in the primer will not bind substantially to any other 
nucleotide in the primer apart from those to which it is covalently linked. In addition 
preferably each primer is designed so that it does not bind other primers with the 
same sequence. 

In one embodiment the 3' region, and preferably the 3' terminal nucleotide of 
the primer binds to the target sequence with high affinity, thus preferably this region 
or nucleotide comprises a G or C. 

Generally each primer is designed to have an annealing temperature of from 
30 to 65 °C, such as 50 to 60 °C or 35 to 45 °C. In addition each primer pair may be
designed to ensure that the two primers do not bind to each other. 
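
The primer properties discussed above may be checked computationally along the following lines. This is a sketch only. The Wallace rule (2 °C per A or T, 4 °C per G or C) is used here as a rough melting temperature estimate and is an assumption of the sketch, since the description does not specify how the temperature is calculated. The function names and the example primer are likewise hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse complement of a non-degenerate DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def wallace_tm(primer):
    """Rough melting temperature estimate (Wallace rule), in degrees C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def has_gc_3prime(primer):
    """True if the 3' terminal nucleotide is G or C."""
    return primer.upper()[-1] in "GC"

def longest_self_complementary_run(primer):
    """Length of the longest stretch of the primer that could pair with
    another copy of the same primer (or with the primer itself)."""
    p = primer.upper()
    rc = reverse_complement(p)
    best = 0
    for i in range(len(p)):
        for j in range(i + 1, len(p) + 1):
            if j - i > best and p[i:j] in rc:
                best = j - i
    return best

# Hypothetical primer used only to exercise the checks.
primer = "ACGTTGCATGCAAGC"
report = (wallace_tm(primer), has_gc_3prime(primer),
          longest_self_complementary_run(primer))
```

A long self-complementary run suggests secondary structure or primer dimer formation, which the preferred primers avoid.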

The primers are designed by a computer based algorithm. In one embodiment 
such an algorithm designs primers according to the following rules: 

1) A set of blocks is input, where a block is an aligned array of amino acid 
sequence segments without gaps that represents a highly conserved region of 
homologous proteins. A weight is provided for each sequence segment, which can be 
increased to favour the contribution of selected sequences in designing the primer. A
codon usage table is chosen for the target genome. 

2) An amino acid position-specific scoring matrix (PSSM) is computed for
each block using the odds ratio method.

3) A consensus amino acid residue is selected for each position of the block
as the highest scoring amino acid in the matrix.

4) For each position of the block, the most common codon corresponding to
the amino acid chosen in step 3 is selected utilizing the user-selected codon usage
table. This selection is used for the default 5' consensus clamp in step 8.

5) A DNA PSSM is calculated from the amino acid matrix (step 2) and the
codon usage table. The DNA matrix has three positions for each position of the
amino acid matrix. The score for each amino acid is divided among its codons in
proportion to their relative weights from the codon usage table, and the scores for
each of the four different nucleotides are combined in each DNA matrix position.
Nucleotide positions are treated independently when the scores are combined. As an
option, the highest scoring nucleotide residue from each position can replace the
most common codons from step 4 that are used in the consensus clamp.

6) The degeneracy is determined at each position of the DNA matrix based on
the number of bases found there. As an option, a weight threshold can be specified
such that bases that contribute less than a minimum weight are ignored in
determining degeneracy.

7) Possible degenerate core regions are identified by scanning the DNA
matrix in the 3' to 5' direction. A core region must start on an invariant 3' nucleotide
position, have a length of 11 or 12 positions ending on a codon boundary, and have a
maximum degeneracy of 128 (this is the default setting of CODEHOP). The
degeneracy of a region is the product of the number of possible bases in each
position.

8) Candidate degenerate core regions are extended by addition of a 5'
consensus clamp from step 4 or 5. The length of the clamp is controlled by a melting
point temperature calculation (the CODEHOP default is 60 °C) and is usually about
20 nucleotides.

9) Steps 7 and 8 are repeated on the reverse complement of the DNA matrix
from step 5 for primers corresponding to the opposite DNA strand.
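
Steps 6 and 7 of the above rules may be illustrated by the following sketch. It is not the CODEHOP implementation. The DNA matrix is represented, for the purposes of the sketch, simply as a list of sets of the bases retained at each position, and position 0 is assumed to be the first base of a codon. All names are hypothetical.

```python
from math import prod

def degeneracy(positions):
    """Product of the number of possible bases at each position (step 6)."""
    return prod(len(bases) for bases in positions)

def core_regions(dna_positions, max_degeneracy=128, lengths=(11, 12)):
    """Scan 3' to 5' for candidate core regions (step 7).

    Returns (start, end) slice indices; dna_positions[end - 1] is the
    invariant 3' position and start falls on a codon boundary.
    """
    found = []
    for end in range(len(dna_positions), 0, -1):   # scan in the 3' to 5' direction
        if len(dna_positions[end - 1]) != 1:       # 3' position must be invariant
            continue
        for length in lengths:
            start = end - length
            if start < 0 or start % 3 != 0:        # 5' end must sit on a codon boundary
                continue
            if degeneracy(dna_positions[start:end]) <= max_degeneracy:
                found.append((start, end))
    return found

# Hypothetical matrix for the peptide K-F-M-W (AAR TTY ATG TGG).
matrix = [{"A"}, {"A"}, {"A", "G"},
          {"T"}, {"T"}, {"T", "C"},
          {"A"}, {"T"}, {"G"},
          {"T"}, {"G"}, {"G"}]
cores = core_regions(matrix)
```

Each candidate returned would then be extended at its 5' end by the consensus clamp as described in step 8.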

In one embodiment CODEHOP (Rose et al (1998) Nucleic Acids Research 
26, 1628-1635) is used to design the primer pairs. This program uses the above
rules.

The primers designed by the algorithm may then be mapped back to the 
original sequence to choose primer pairs which provide the desired length of PCR 
product. 
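
The selection of primer pairs by product length may be sketched as follows. The coordinates, primer names and length limits are hypothetical and are used only to illustrate the mapping step. Forward primers are represented by the position of their 5' end on the original sequence and reverse primers by the position at which the amplified product would end.

```python
def choose_pairs(forward, reverse, min_len, max_len):
    """Return (forward, reverse, product_length) for every combination whose
    predicted PCR product length lies within the desired range."""
    pairs = []
    for f_name, f_start in forward.items():
        for r_name, r_end in reverse.items():
            length = r_end - f_start
            if min_len <= length <= max_len:
                pairs.append((f_name, r_name, length))
    return pairs

# Hypothetical mapped positions on a reference nucleotide sequence.
forward_primers = {"F1": 100, "F2": 400}
reverse_primers = {"R1": 650, "R2": 1500}
selected = choose_pairs(forward_primers, reverse_primers, 200, 600)
```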

The above-described computer based method is repeated until the desired
number of primer pairs has been designed. Optionally the primer pairs can then be
synthesised and tested. They are typically tested to determine the optimal conditions 
for using the primers in a PCR reaction. 

The PCR reaction is carried out in a PCR mixture that generally comprises 
the following: the template polynucleotide (which will be amplified in the event of
virus detection), one or more primer pairs designed as described above, a polymerase
enzyme (typically a DNA polymerase, such as Taq polymerase), deoxynucleotide 
triphosphates (dATP, dTTP, dCTP and dGTP) and a suitable buffer. 

The PCR reaction generally comprises cycles of the following steps: a
denaturation step, a primer annealing step and a polynucleotide synthesis step.
Typically the PCR reaction comprises at least 25 cycles, such as 30, 35, 40 or more
cycles, up to a maximum of 60 cycles for example. Generally in the denaturation 
step the PCR mixture is heated to a temperature at which the polynucleotides in the 
PCR mixture (in particular the polynucleotide region to be amplified) denature to 
single stranded form. The denaturing temperature is generally from 85 to 98 °C. 

In the primer annealing step the primers bind to template nucleotide sequence
in a sequence specific manner. This step is generally carried out at a temperature of
from 30 to 65 °C. In the polynucleotide synthesis step the polymerase
replicates/synthesises nucleotide sequence based on template sequence by addition of
nucleotides to the 3' end of the bound primers. This step is generally carried out at
about 72 °C.

In one embodiment the primers are tested for their ability to amplify one or 
more of the plurality of nucleotide sequences from known viruses which were used to
design the primers, or in the case of amino acid sequences from known viruses being 
used to design the primers the primers may be tested for their ability to amplify the 
nucleotide sequence from the virus which encodes the amino acid sequence. 
The primers may be tested in a range of buffer conditions to determine
optimal buffer conditions for PCR using the primers. The buffer conditions which
may be tested include pH (typically between 7 and 10), magnesium concentration
(typically from 0.5 mM to 5 mM), potassium chloride (typically from 0 to 100 mM),
ammonium chloride (typically 0 to 100 mM), glycerol (typically 0 to 20%),
dimethylsulphoxide (typically 0 to 20%), ethanol (typically 0 to 20%), sorbitol
(typically 0 to 20%) or betaine (typically 1 M betaine).

The primers may be tested at a range of different temperatures to determine
the optimal temperatures in the PCR reaction. Preferably the primers are tested in
PCR reactions in which a range of primer annealing temperatures is tested.
Typically the range of temperatures is from 30 to 65 °C.

The panel of primer pairs or a group of primers within the panel may be 
designed to be used together on the same plate (i.e. using the same thermal cycles). 
Thus such primer pairs will be designed to work at the same annealing temperature. 
In one embodiment a group of primer pairs within the panel is designed to
have similar optimal conditions for use in PCR so that they can be used optimally in
the same well or reaction vessel, i.e. so that they can be used in multiplex PCR. Such a
group typically comprises at least 2, 3, 4, 5, 6 or more primer pairs (up to a
maximum of 8 primer pairs for example).

To provide such primer pairs the computer based method steps may be used
to design primer pairs which are calculated to have similar annealing temperatures
and/or the primers are tested to select primer pairs which can be used optimally 
together. Such testing typically determines whether the primers work optimally with 
the same buffers and/or whether the primers have similar annealing temperatures. 
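
The grouping of primer pairs with similar annealing temperatures may be sketched as follows. The tolerance of 2 °C, the greedy grouping strategy, the primer pair names and the temperatures are all assumptions made for the sketch. The limit of 8 primer pairs per group follows the maximum mentioned above.

```python
def group_by_annealing(pairs, tolerance=2.0, max_group=8):
    """Greedily group primer pairs whose calculated annealing temperatures lie
    within `tolerance` degrees C of the lowest temperature in the group."""
    groups = []
    for name, temp in sorted(pairs.items(), key=lambda item: item[1]):
        current = groups[-1] if groups else None
        if current and temp - current[0][1] <= tolerance and len(current) < max_group:
            current.append((name, temp))
        else:
            groups.append([(name, temp)])
    return [[name for name, _ in group] for group in groups]

# Hypothetical primer pairs and calculated annealing temperatures (degrees C).
annealing = {"herpes_1": 55.0, "adeno_1": 56.5, "pox_1": 48.0,
             "retro_1": 49.0, "flavi_1": 60.0}
multiplex_groups = group_by_annealing(annealing)
```

Groups proposed in this way would still need to be confirmed empirically, for example by the buffer and temperature testing described above.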
In one embodiment one or both primers of each primer pair in the
group carries a label. Typically at least one of the primers in each primer pair will
carry a different label from that used for the other primer pairs. The PCR product
generated by labelled primers carries the labels present on the primers. Thus after the
group of primers has been used for PCR in the same well, detection of the labels in
the PCR products can be used to deduce which PCR product was formed from each
primer pair. In one embodiment all forward primers of the group are labelled with
one colour and the reverse primers are labelled with a different colour.

In a preferred embodiment the primers are labelled with a fluorescent label, 
such as fluorescein based labels (e.g. fluorescein isothiocyanate). Different primer 
pairs may be labelled with fluorescent labels of different colours. The fluorescent 
labels which are used may be capable of detection by a Beckman CEQ2000™ or
Applied Biosystems A3 700™ fluorescent DNA analyser. The fluorescent labels may
be obtained from Beckman Coulter or Applied Biosystems.

Another way of determining which PCR products are generated
by which primer pair is for each primer pair in the group to generate a PCR product
of different size to the PCR products generated by the other primer pairs of the
group. Typically each PCR product which is generated by the group of primers
differs in size from all the other PCR products by at least 20, such as at least 50, 100,
200, 500, 1000 or more nucleotides. Each PCR product may for example differ in
size from all other PCR products by up to a maximum of 3000 nucleotides.
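
Whether the products of a multiplex group can be distinguished by size may be checked as in the following sketch. The function name and the product sizes are hypothetical. The default minimum difference of 20 nucleotides is the lowest figure given above.

```python
def sizes_resolvable(product_sizes, min_gap=20):
    """True if every PCR product differs in size from every other product in
    the group by at least `min_gap` nucleotides."""
    ordered = sorted(product_sizes.values())
    return all(b - a >= min_gap for a, b in zip(ordered, ordered[1:]))

# Hypothetical predicted product sizes (nucleotides) for one multiplex group.
group_sizes = {"herpes_1": 250, "adeno_1": 410, "pox_1": 425}
resolvable_at_20 = sizes_resolvable(group_sizes)
resolvable_at_10 = sizes_resolvable(group_sizes, min_gap=10)
```

Comparing adjacent sizes after sorting is sufficient, because the smallest difference between any two products always occurs between neighbours in the sorted list.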
The following Example illustrates the invention: 

Example

The Example below refers to Figure 1 which shows how primers were
designed using a database known as 'VIDA' and computer programs known as
'CLUSTALW', 'BLOCKMAKER' (or 'BLOCKS') and 'CODEHOP'.

Designing a panel of primers

A panel of primers was designed for detecting unknown viruses from the
family Herpesviridae according to the strategy shown in Figure 1. The amino acid
sequences of herpes virus DNA packaging protein UL15 were obtained from the
VIDA database (Alba et al, see above). These sequences are shown in Table 1.

The sequences obtained from the VIDA database were then imported into
CLUSTALW. This compares the protein sequences to identify conserved regions
and then aligns the sequences according to the conserved regions. The alignment
produced by CLUSTALW is shown in Table 2.

The BLOCKMAKER program was then used to extract blocks of conserved
aligned sequences which do not contain gaps from the CLUSTALW alignment and
enter them into CODEHOP. The primer sequences were then designed by CODEHOP
using the conserved sequences. The output from the CODEHOP program is shown in
Table 3. The 'Complement of Block' sequences shown in Table 3 show the sequence
of the other strand, allowing primers to be designed for amplification in the opposite
direction.
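
The sequence of the other strand of a degenerate primer or block may be derived as in the following sketch, which uses the standard IUPAC nucleotide ambiguity codes. The function name and the example sequence are assumptions. The sketch is not a reproduction of the CODEHOP output format.

```python
# Complements of the IUPAC nucleotide codes, including degenerate codes.
IUPAC_COMPLEMENT = {
    "A": "T", "T": "A", "G": "C", "C": "G",
    "R": "Y", "Y": "R", "M": "K", "K": "M",
    "S": "S", "W": "W", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}

def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) nucleotide sequence."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))

# Hypothetical degenerate sequence used only to exercise the function.
other_strand = reverse_complement("ATGCRYN")
```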



Table 1. All protein sequences of DNA packaging protein UL15 extracted from VIDA.
Here written as a list and unaligned.

>gi_10180719 

MFGGLLGEETKRHFERmKTKNDRLGASHRNERS IRDGDMVDAPFLNFAI PVPRRHQTVMPAIGILHNCC
DS LGI YS AI TTRML YSS I ACS EFDELRRDS VPRCYPR I TNAQAFLS PMMMR VANS 1 1 FQE YDEMECAAHR 
NAYYSTMNSF I SMRTSDAFKQLTVFI SRFSKLLI AS FRDVNKIjDDHTVKKRARIDAPS YDKLHGTLEIiFQ 
KMIIiMHATYFVTSVLLGDHAERAEIUjLRVAFBTPHFSD 

MS S FEGI RI GYTS HI RKA I E P VFED I GDRLRRWFGAHRVDHVKGET I TFS F PS GLKST VTFAS S HNTNS I 
RGQDFNLLFVDEANFIRPDAVQTI I GFLNQATCKI I FVS S TNS GKASTS FL YGLKGS ADDLLNVVTYI CD
EHMKHVTDYTNATS CS CYVLNKPVF I TMDGAMRRTAEMFLPDS FMQE 1 1 GGGWDRTI CQGDRS IFTAS A 
IDRFLI YRPSTVNNQD PFS QDL Y VYVD PAFTANTKAS GTGVAVI GKYGTD Y I VFGLEHYFLRALTGES S D 
3 1 GYCVAQCL I Q I CAIHRKRFGVI KI AIEGNSNQDS AVAI ATR I AIEMI S YMKAAVAPTPHNVS FYHS ICS 
NGTD VE YPYFIdliQRQKTTAFDFF I AQFNS GRV1AS QDLVS TTVS LTTD PVE YLTKQLTNI S E WTGPTCT 
RTFS GKKGGNDDT WALTMA VY I S AH I PDMAFAP IRV
>gi_7673189

MFGGLLGEETKRHFERLMKTKNDRLGASHRNERS I RD GDMVD AP FLNF A I PVPRRHQTVMPAIGILHNCC 
DSLGI YSAITTRMLYSS IACSEFDELRRDS VPRCYPR I TNAQAFLS PMMMRVANS 1 1 FQE YDEMECAAHR 
NAYYSTMNSFISMRTSDAFKQLTWISRFSKI^IASFRDVNKLD^ 
KM I FDACHLFCNFCFTWRSRRASERIoLRVAFDT

MSSFEGIRIGYISHIRKAIEPVFED I GDRLRRWFGAHRVDHVKGET I TFS f ps glks tvtfas SHNTNS I 
RGQDFNLLFVDEANFIRPDAVQTI I GFLNQATCKI I FVS STNSGKASTSFL YGLKGS ADDLLNVVTYI CD 
EHMKHVTDYTNATSCSCYVLNKPVFIT^ 

I DRFL I YRP S TVNNQD PFS QDLYVYVD PAFTANTKASGTGVAVI GKYGTD Y I VFGLEHYFLRALTGES S G 
S I GYCVAQCL I Q I CAI HRKRFGVI KI AI EGNSNQDS AVAI ATR I AIEM I S YMKAAVAPT PHNVS FYHS KS
NGTDVEYPYFLLQRQKTTAFDFFIAQFNSGRVIASQDLVSTTVSLT^^ 
RTFS GKKGGNDDT WALTMAVY ISAHI PDMAFAP IRV 
>gi_5689285 

MFGGALGESAKKHFERLLRDRNERLGASRKNECLARGGSLVDAPF^ 
DGTGI YSAIATRLLYAGI VSSEFGEVRRESLSNGHISKRNREALLAPTLTRVANS ITFHEYDDAQCAAHR
NAY YSTMNTFGSMRTSD AFQQLAS F IDRFS KLLAAS FKD VNILDRNNAPKRAR ITAPS YDKPHGTLELFQ 
KM IliMHATYFLTSVLLEDHAERAERLLRVI FD I PDFSDAATRHFRQRATVFL VPRRHGKTWFLVPLI ALA 
MSSFEGIRIGYTSHIRKAIEPVF^EIGDRLRRWFGTQCVDHVKGETITFSFPSGS 

RGQDFNLLFVDEANF I RPDAVQT 1 1 GFLNQANCKI I FVS S TNS GKAS TS FLYGLKGS ADDLLNVVT Y I CD 
EHMKHVTNYTNATS CS CYVLNKPVT? I TMDGAMRRTAEMFLPDS FMKE I IGGITMDRNTCQGDRGVFTAS A
VERLLLYRPSTVRNQD I LS RDLYVYVD PAFTANTRAS GTGI AVI GRYGADYI I FGLEHFFLRALTGES AD 
AIGECAAQCIAQI CAIHCERFGTIRVAVEGNSNQDSAVAIATRIS IDLAS YVQSGVAPAPHDVCFYHSKP 
AGSNVE YPFFLLQRQKTAAFDFF I ARFNS GRVLAS QDL VSTT I SLSTD P VE YLTKQLTNLSE WTGATGT 
RTFS GKKGG YDDTVVAL VMA VY I S AHASDATFAP IRGVEATCRGPTEA 
>gi_1869837

MFGQQlASDVQQYLERLEKQRQQKVGVDEASAGLTLGGDAIiRVPF^ 

EHS PLFS AVARRLLFNS LVPAQLRGRDFGGDHTAKLE FLAPELVRAVARLRFRE GAPED AVPQRNAYYSV 
LNTFQALHRSEAFRQL VHF VRDFAQLLKTS FRAS SLAETTGP PKKRAKVD VATHGQT YGTIjELFQKM ILM 
HATYFLAAVLLGDHAEQVNTFLRLVFEIPLFSDTAVRH^ 
GI KI GYTAHIRKATEPVFDE I DACLRGWFGS SRVDHVKGET I S FS FPDGS RS T I VFAS S HNTNGI RGQDF
NLLFVDE ANF I RPDAVQT IMGFLNQANCKI I FVS S TNTGKASTS FLYNLRGAADELLNVVTY I CDDHMPR
VVTHTNATACS CYILNKP VF I TMD GAVRRTADL FL PD S FMQE 1 1 GGQARETGDDRP VLTKS AGERFLLYR 
PS TTTNS GLMAPEL YVY VD PAFTANTRAS GTGI A WGR YRDDF I 1 FAIiEHFFLRALTGSAPADIARCWH 
Sl^QVLALHPGAFRSVRVAVEGNSSQDSAVAIAT^ 

PFFLLNKQKTPAFE YF I KKFNS GGVMASQELVS VTVRLQTDPVE YLSEQLNNL I ETVS PNTD VRMYS GKR 

NGAADDLWAVIMAIYLAAPTGI PPAFFPITRTS 

>gi_59501 

MFGQQLAS D VQQ YLE RLE KQRQLK VGADE AS AGLTMGGD ALR VP FLD FAT AT PKRHQT W P GVGTLHD CC 
EHSPLFSAVARRLLFNSLVPAQLKGRDFGGDHTAKLEFLAPELVRA^ 

LNTFQALHRSEAFRQL VHFVRDFAQLLKTS FRAS S LTETTGP P KKRAKVD VATHGRT YGTLEL FQKM I LM 
HAT Y FLAAVLLGDHAE Q VNTFLRL VFE I PLFSDAAVRHFRQRATVFLVPRRHGKTWFLVPLI ALSLAS FR 
GI KI GYTAHIRKATE P VFEE I DACLRGWFGS AR VDHVKGET I S FS FPDGS RS T I VFAS S HNTNG IRGQDF 
NLLFVDEANFIRPDAVQTIMGFLNQANCKI I FVS 3 TNTGKAS TS FLYNLRGAADELLNVVTY I CDDHMPR 

PS TTTNS GLMAPDLYVYVD PAFTANTRAS GTGVA WGRYRDD Y 1 1 FALEHFFLRALTGS APAD I ARC WH 
SLTQVLALHPGAFRGVRVAVEGNS S QDS AVAI ATHVHTEMHRLLAS EGADAGS GPELLF YHCE P PGS AVL 
Y P F FLLNKQKT PAFEHF I KKFNS GG VMAS QE I VS AT VRLQTD P VEYLLEQLNNLTE TVS PNTD VRTYS GK 
RNGAS DDLMVAV IMAI YLAAQAGP PHTFAP I TRVS 
>gi_2605992

MFGKALSRETIQYFim&KEVQSRSGAK^ 

CETAQ I FAS VARRLLFRS LS KWRGGES KERLDPS S VE AYVDPKVKQALKT I S FVE YNDAEARS CRNAYYS 
IMNTFDSLRS S DAFHQVANFVARFSRLVDTS FNGADLDGDGQQTS KR I KVD VPT YGKQRGTLELFQKM I L 
MHATYF I AAVI LGDHADR I GAFLKMVFNT PEFS DAT I RHFRQRATVFLVPRRHGKTWFL VPL I ALALATF 
KGI KI GYTAHI RKATEP VFDE I GARLRQWFGNS P VDHVKGENI S FS FPDGS KST I VFAS SHNTNG I RGQD 
FNLLFVDEANF IRPEAVQT 1 1 GFLNQTNCKI I FVS S TNTGKAS TS FLYNLKGAADELIiNVVT Y I CDEHME 
RVKAHTNATS CS C Y I IjNKP VF I TMDGAMRNTAELFLPDS FMQE 1 1 GGGNI SGAHRDE P VFTKTAQDRFLIi 
YR PST VANQD IMSNNL Y VYVD PAFTTNAMAS GTG VA WGR YRSNWI VFGLEHFFLS ALTGS S AEL I ARC V 
AQCLAKVFAI HS RPFDS VR I AVEGNS S QDAAVAI ATNI QLELNTLRQADVVHMPGTVLFYHCTPPGS S VA 
YPFFLLQKQKTGAFDHF IlKAFNSGLVLAS QEL I SNTVRLQTDP VE YLLTQMKNLTE VI TGTS ETRVFTGK 
RNGASDDMLVALVMAVYMASLPPTTNAFSSLSTQ 
>gi_330792 

MFGRVLGRETVQYFEAIiRREVQARRGAKNRA 

CETAQI FAS VARRLLFRS LS KWQS GEARERLDPAS VEAYVD PKVRQALKT I S F VE YS DDEARS CRNAYYS 
IMNTFDALRSSDAFHQVASFVARFSRLVDTSFNGAD 

MHATYF I AAVI LGDHADR I GAFLKMVFNT PEFS DAT I RHFRQRATVFL VPRRHGKTWFLVPL I ALALATF 
KG I K I GYTAHI RKATE P VFDE I GARLRQWFGNS P VDHVKGEN ISFSFPDGSKSTI VFAS S HNTNG I RGQD 
FNLLFVDEANFIRPEAVQTI I GFLNQTNCKI I FVS S TNTGKAS TS FL YNLKGAADDLLNVVTY I CDEHME 
RVKAHTNATACS CY I LNKP VF I TMDGAMRNTAELFLPDS FMQE 1 1 GGGNV S GAHRDE P VFTKTAQDRFLL 
YRPS T VANQD IMS S DLYVY VDPAFTTNAMAS GTGVAWGRYRSNWWFGMEHFFLS ALTGS S AELI ARCV 
AQCLAQVFAIHKRPFDS VRVAVEGNS S QDAAVAI ATNI QLELNTLRRADVVPMPGAVLFYHCT PHGS S VA 
YPFFLLQKQKTGAFDHF I KAFNS GS VLAS QELVSNTVRLQTDPVE YLLTQMKNLTE VVTGTSETR VFTGK 
RNGASDDMLVALVMAVYLS S LP PTSD AFS SLPAQ 
>gi_971317 

MFGGAVGEQSARYFQRLLRERQRRAAERGARPDGGGGARGEDDARVPFI»DFAVAAPKRHQTW^ 
GYCSLAPLFAATASRLLLTSMARAE AGLNTGTGEAH^ VMA 
ALE SMRAS GAFAQVAAFVARFSRL VGTS FSHLGGGDD ADP PRAKRARVE PPS GQTRGALELFQKM ILMPA 
T YFVAATLLGEHAERI GAFLR VAFNT PDFS DAAVAHFRQRATVFLVPRRHGKTWFLVPL I ALALATFKGI
KI GYTAHIRKATE PVFEE I VARLRQWFGGERVDHVKGEVI S FS FPDGARST I VFAS S HNTNG I RGQDFNL 
LFVDEANFIRPEAVQT I VGFXiNQASCKI I FVS STNTGKASTS FLY^KGASDGLIaNVVTY I CNEHTPRVA 
AHGGATACS CYVL^P VF ITMD AAARNTAETFLPNS FMQE 1 1 GGGE VARRAE PAA VFTRAAGE Q FLL YR P 
S TAAARGPWPERLYMYIDPAFTSNARAS GS GI A WGRHRGS WL VLGLEHFFLPALTGS SAAE IARCAVRC
FAQVMAVHRRRLDGLFVAVEGNSSQDSAVAIALGVRRELDSL^^ 

FLLQKQKTAAFDHF I RLFNS GR WAS QDLAS LTVRLQTD P VE YLFEQLQNIjTE STAGPGGARAFS GKRRG 
AADDLMVAIiVMAVFVGSLPPTDGAFGPIiAPRPPAD 
>gi_5863S08
MSLIMFGRTLGEESVRYFEFJliKRRRDERFGTLE

I GTLHNCCE YI PLFSATARRAMFGAFLS STGYNCTPNWLKPWRYSVNANVS PELKKAVSS VQFYEYS PE 
E AAPHRNAYS GVMNTFRAFSLSDS FCQLSTFTQRFS YL VETS FES IEECGS HGKRAKVD VP I YGR YKGTL 
ELFQKMIMHTTHFISSVLLGDH^ 

I ALVMATFRGI KVGYTAHIRKATEPVFEGI KSRLEQWFGANYVDHVKGES ITFSFTDGS YSTAVFASSHN 
TNGIRGQDFNLLFVDEANF I RPDAVQT I VGFLNQTNCKI I FVS STNTGKASTS FLYNLRGS S DQLLNWT
YVCDDHMPRVLAHSDVTACS CYVIiNKPVFITMDGAMRRTADLFMADSFVQE I VGGRKQNSGGVGFDRPLF 
TKTARERFILYRPSTVAWCAILSS VliYVYVDPAFTSirrRASGTGVAIVGRYKSDWI I FGLEHFFLRALTG 
TS SSE I GRCVTQCIiGHILAIjHPITTFTNVHVS IEGNSSQDS AVAI SIAI AQQFAVLEKGNVLS S AP VLLFY 
HS IPPGCSVAYPFFLLQKQKTPAVDYFVKRFNSGNI IAS QELVSLTVKLGVDP VEYXjCKQLDNLTEVI KG 
GMGNIjDTKTYTGKGTTGTMSDDLMVALIMSVYIGSSCI pdsvfmpik
>gi_5708110 

mlgkesveivkryrdalrkrtmergpddvdgqemsdsnfi^^ 

qrhqaciapigsfhnccaisrafs ymasei i yenlasystkytdtdaaidtolqvspkrqlftgaaeds il 
palrqklaniinfarfapsdslihdkafdgimn^^ 
raklekttseqrdgtlelfqkmii^hatyfassiclgegstersnrylstvfntslfseni iqhfrqrtt
vflvprrhgkt wflvpl i sllvs s fegir i gytahlrkate pvf i e i ftrlykwfgakqveqvkget itf 
t frngnks a i vf as s qntngl.rgqd fnflfvde anf i kpaalhtvmgflnqtncklffvs s tnt chs nts 

LL YNIjKGKTNSLLNVVTYI CDEHMPEIQKRTD VTTCS CYVLHKPVFVSMDSE VRNTADLFVKDS fmhe I a 
GGRAGKYDSDRTLVPVRALDQFLI YRPSTS S KPNI SGLGKILTVYVDPAFTTNRS ASGTGIALVTALRDS 
MVLMGAEHFYLDAIiTGEAALE^

LRRRLGFS IiTFAHSRQPGTAMAHPFYLIjNKQKS RAFDLF VS LFNS GRFMAS QEL VSNTLVLS KD P CE YLV 
DQ IRNI TVTHGQGFDS FRTFS GKQGRVPDDML VAAVMS TYLALEGS PTAGYHP I AP IGRRQRPA 
>gi_1813970 

MLRGDS AAKI QERYAELQKRKSHPTS CIS TAFTNVATIiCRKRYQMMHPELGIiAHS CNEAFLPLMAFCGRH 
3 5 RDYNS PEE S QREIjLFHERLKS ALDKLTFRPCSEEQRAS YQKLJ3 ALTELYRD PQPQQ I NNFMTDFKKWLDG 
GFSTAVEGDAKAIRLEPFQKNLLIHVIFFIAVTKI PVLANRVLQYLIHAFQIDFLSQTS IDI FKQKATVF 
LVPRRHGKTWFI I P 1 1 S FLLKHM I GI S I GYVAHQKHVS QF VLKEVEFRCRHTFARD YVVENKDNV I S IDH 
RGAKSTALFASCYEPmSIRGQNFHLIj^^ 

RLNNAPFDMLiNVVS YVCEEHLHS FTEKGD ATAC PC YRLHKPTF I SLNS QVRKTANMFM PGAFMDE 1 1 GGT 
40 NKISQITTVLITDQSREEFDILRYSTIiNTNAYDYFGKTLT 

GLEHFFTiRDLS ES SE VAI AECAAHM 1 1 S VLS LHP YLDELR I AVEGJSTTNQAAAVR I ACL I RQS VQS S TL I R 

VLFYHTPDQNHIEQPFYIjMGFJDKAIjAVEQFISRF 

TLAEGTTARYSAKRQNRISDDLIIAVIMATYLCDDIHAIRFRVS 

>g±_2746296 

45 MIiRS CD I DAI QKAYQS 1 1 WKHEQDVKI S STFPNS AIFCQKRF I ILTPELGFTHAYCRHVKPLYLFCDRQR 
HVKS KI AI CDPLNCALSKLKFTAI I EKNTE VQYQKHLELQTS FYRNPMFLQ I EKF I QDFQRWI CGDFENT 
NKKERIKLEPFQKS ILIHI IFFISVTKLPTLANHVLDYLKYKFDIEFIl^ 

KTWFMI PVI CFLLKNLEGI S IGYVAHQKHVSHFVMKDVEFKCRRFFPQKNITCQDNVITIEHETIKSTAL 
FAS CYNTHS IRGQS FNLL I VDESHF I KKDAFS T ILGFL PQS S TK 1 1 F I S S TNS GNHS TS FLTKL S NS PFE 
I^TVVS YVCEDHVHIIjNDRGNATTCACYRIjHKPKFI S INADVKKTADLFLEGAFKHE IMGGSLC3JVVNDT 
5 LITEQGLIEFDLFRYSTISKQI I PFLGKELYI YIDPAYTINRRASGTGVAAIGTYGDQYI I YGMEHYFLE 
SLLSNSDAS IAECASHMILAVLELHPFFTELKI 1 1 EGNSNQS S AVKI AC I LKQT I S VIRYKHI TFFHTLD 
QSQIAQPFYLLGREKRLAVEYFISNFNSGYIKASQ 
NAKKQTCSDDLLIS I IMAIYMCHEGKQTSFKEI 
>giJ3 25496 

1 0 MLRTCD ITHI KNNYE AI IWKGERDCSTISTKYPNSAIFYKKRFIMLTPELGFAHS YNQQVKPLYTFCEKQ 
RHLKlTOKPLTILPSLSHKLQEMKFLPASDKSFESQYTEFLESFKILra 

ND FGDTRKI QLE PFQKNI L I HVI FF I AVTKL PALANRV liNYLTHVFD I EFVNE S TLNTLKQKTNVFLVPR 
RHGKTWFI VPI ISFLLKNIEGIS IGYVAHQKHV"SHFVMKEVEFKCRRMFPEKTITCIiDNVITIDHQNIKS 
TALFASCYNTHQSIRGQSF]^LIVDESHFIKKDAFSTILGFLPQASTKILFISST^ 

15 S PFEMLS WS YVCEDHAHMLNERGNATACS CYRLHKPKFI S INAEVKKTANLFLEGAFIHE IMGGATCNV 
IND VL ITEQGQTEFEFFRYST INKNL I PFLGKDLYVYLD PAYTGNRRASGTGI AAI GTYLDQY I VYGMEH 
YFLESLMTSSDTAIAECAAHMILS ILDLHPFFTEVKI I IEGNSNQASAVKIACI IKENITANKS IQVTFF 
HTPDQNQIAQPFYLLGKEKKLAVEFFISNFNSGNIKASQELISFTIKIT^ 
YITYSAKKQACSDDLI IAI IMAIYVCSGNSSASFREI 

20 >gi_854 039 

MKLNNS PFEMLS WS YVCEDHAHMLNERGNATACS CYRLHKPKF I S INAE VKKTANLFLEGAF I HE IMGG 
ATCNVINDVL ITEQGQTEFEFFRYST INKNL I PFLGKDLYVYLD PAYTGNRRASGTGI AAI GTYLDQYI V 
YGMEHYFLESMTSSDTAI AECAAHMILS ILDLHPFFTEVKI I IEGNSNQASAVKIACI I KEN I TANKS I 
Q VTFFHTPDQNQ I AQPF YLLGKEKKLAVEFF I SNFNS GNI KAS QEL 1 5 FTI K I TYD PVE YALEQ IRNIHQ 
25 ISVNNYITYSAKKQACSDDLIIAIIMAIYVCSGNSSASFREI 
>gi_5733S64 

MLRTCDITHIKNNYEAIIWKGERNCSTISTKYPNSAIFYKKRFIML^ 
RHLKNRKPLTILPSLTRKLQEMKFLPASDKSFESQYTEFLESFKILYREPLF^ 
NDFGDTRKIQ^PFQKNILIHVIFFIAVTKLPALANRVINYLTHV^ 
3 0 RHGKTWF I VP IIS FLLKNI EGI S I GYVAHQKHVS HFVMKE VEFKCRRMF PEKT I TCLDNVI T I DHQNI KS 
TALFASCYNTHSIRGQSFNLLIVDESHFIKKDAFSTILGFLFQA^ 

PFEMLS WS YVCEDHAHMLNERGNATACS CYRLHKPKFI S INAEVKKTANLFLEGAFIHE IMGGATCNV I 
ND VL I TEQGQTEFEFFRYST INKNL I PFLGKDLYVYLD PAYTGNRRAS GTGI AAI GTYLDQY I VYGMEHY 
FLESLMTS SDTAI AECAAHMILS ILDLHPFFTEVKI I IEGNSNQASAVKIACI I KENITANKS IQVTFFH 
35 TPDQNQIAQPFYLLGKEKKIAVEFFISNFNSGNIKASQELISFTIKITYDPVEYALEQIRNIHQISVNNY 
ITYSAKKQACSDDLI IAI IMAIYVCSGNSSASFREI 
>gi_4996048 

MKLNNS PFEMLS WS YVCEDHAHMLNERGNATACS CYRLHKPKFI S INAE VKKTANLFLEGAF I HE IMGG 
ATCNVINDVLITEQGQTEFEFFRYSTINKNLI PFLGKDLYVYLD PAYTGNRRASGTGI AAI GTYLDQYIV 
40 YGMEHYFLESLMTSSDTAI AECAAHMILS ILDLHPFFTEVKI I IEGNSNQASAVKI ACI IKENITANKS I 
QVTFFHTPDQNQ I AQPFYLLGKEKKLAVEFF I SNFNS GNI KASQEL IS FTI KI TYD PVE YALEQ IRNIHQ 
ISVNNYITYSAKKQACSDDLI IAI IMAIYVCSGNSSASFREI 
>gi_11368 08 

MIJjSRHRERLAANLEETAKDAGERWELSAPTFTRHCPKTAR 
45 T PTS ANPD VGTPRPSEDNVPAKPRLLESLST YLQMRCVREDAHVSTAD QLVE YQAGRKTHDSLHACS VYR 
ELQAFL VNLS S FLNGCYVPGVHWLEPFQQQLVMHTFFFLVS I KAPQKTHQLFGLFKQYFGLFETPNS VLQ 
TFKQKAS VFLI PRRHGKTW I WAI I SMLLAS VENINI GYVAHQKHVANS VFAE 1 1 KTLCRWF P PKNLNI K 
KENGTI I YTRPGGRSS SLMCATCFNKNS I RGQTFNUuYVDEANF I KKDALPAI LGFMLQKDAKL I F I S S V 
ITS S DRS TSFLLNLRNAQEKMLNVVS YVCADHREDFHLQDAL VS CPCYRLHI PTYI TIDES IKTTTNLFME 
GAFDTELMGEGAAS SNATL YRVVGDAALTQFDMCRVDTTAQE VQKCLGKQLFVY I D PAYTNNTEAS GTGV 
5 GAVVTSTQTPTRSLILGMEHFFLRDLTGAAAYEIASCACimK^ 

IATVLNEICPLPIHFLHYTDKSSALQWPIYMLGGEKSSAFETFIYALNSG^ 
TYLVEQVRAIKCVPLRDGGQSYSAKQKHMSDDLLVAV^^ • • 

>gi_ 1718281 

MLQKDAKIilFISSWSSDRSTSFLLNLRNAQEKMIJWVSYVCADHREDF 
10 IDESIKTTTTOFMEGAFDTELMGEGAASSNA^ 

PAYTNNTEAS GTGVGAWTS TQTPTRSL ILGMEHFFLRDLTGAAAYE I AS CACTM I KAI AVLHPT I ERVN 
AAVEGNS SQDSGVAI ATVLNE I CPLPIHFLHYTDKS S ALQWP I YMLGGEKS S AFETF I YALNSGTLS ASQ 
TWSNTI KISFDP VTYLVEQVRAI KCVPLRDGGQS YS AKQKHMSDDLL VAWMAHFMATDDRHMYKPIS P 
Q 

15 >gi_2246515 

MLQKDAKL I F I S S VNS S DRSTS FLLNIiRNAQEI^LNVVS YVCADHREDFHLQDAL VS C PCYRLH I PTYIT 
IDES I KTTTNLFMEGAFDTEI^GEGAASSNATL YR WGDAALTQFDMCRVDTTAQQVQKCLGKQLFVYID 
PAYTNNTEAS GTGVGAWTSTQTPTRSL I LGMEHFFLRDLTGAAAYE IAS CACTM I KA I AVLHPT IERVN 
AAVEGNS SQDS GVAI ATVLNE I CPLP IHFLHYTDKS S ALQWP I YMLGGEKS S AFETF I YALNSGTLS ASQ 
20 TWSNTIKI SFDPVTYLVEQVRAIKCTPLRDGGQS YSAKQKHMSDDIXVAVVMAHFMATDDRHMYKPISP 
Q 

>gi_2246552 

MLLSRHRERLAAl^QETAKDAGERWELSAPTFTRHCPKTARMAHPFIGV^ 
TPTSANPDVGTPRPSEDNVPAKPRBIjESLSTYLQMRCVREDAHVSTADQLVEYQAARKT 

25 ELQAFLVNLS S FLNGCY VPGVHWLE PFQQQLVMHTFFFLVS I KAPQKTHQLFGLFKQYFGLFET PNS VLQ 
TFKQKAS VFLI PRRHGKTWIWAI I SMLLAS VENINI GYVAHQKHVANS VFAE 1 1 KTLCRWFP PKNLNI K 
KENGTI I YTRPGGRS S SLMCATCFNKNS IRGQTFNLL YVDEANF I KKDALPAI LGFMLQKDAKL I F I SS V 
NS SDRSTS FLLNLRNAQEKMLNVVS YVCADHREDFHLQDAL VS C PCYRLHI PT Y I T IDES I KTTTNLFME 
GAFDTELMGEGAAS SNATL YRVVGDAALTQFDMCRVDTTAQQ VQKCLGKQLFVY I D PAYTNNTEAS GTGV 

3 0 GAWTS TQT PTRS L I LGMEHFFLRDLTGAAAYE IAS CACTM I KAI AVLHPT I ERVNAAVEGNS S QDS GVA 
I ATVLNE I CPLP IHFLHYTDKS S ALQWP I YMLGGEKS S AFETFI YALNS GTLS AS QTWSNT I KI S FD P V 
TYLVEQVRAI KCVPLRDGGQSYSAKQKHMSDDIjLVAVVMAHFMATDDRHMYKP ISPQ 
>gi_4494933 

MLQKDAKLIFISSSNSSDKSTSFLLNLKDAHEKMLNV^ 
3 5 IDETVRSTTNLFLEGAFSTEMGDA^TSAQSMHKIVSDSSLSQLDLCRVKSTSQDIQG 

AYTNNTDAS GTGI GAVI AVNHKVI KC I LLGVEHFFLRDLTGTAAYQ I AS CAAAL I RAI VTLHPQI THVNV 
AVEGNSSQDAGVAI ATVLNE I CS VPLS FLHHVDiCNTLIRS PI YMLGPEKAKAFES FI YALNSGTFS ASQT 
WS HT I KL S FD P VAYL I D Q I KA I RC I PL KD GGHTY C^KQKTM S DD VL VAAVMAHYMATND KF VF KS LE 
>gi_J733 0 018 

40 MLQKDAKL I F I S S SNS SDKS TS FLLNLKDAHEKMLNVVNYVC PDHKDDFNLQDTVVACPCYRLHI PAY I T 
IDETVRSTTNLFLEGAFSTEILMGDAATSAQSMHKIV^ 

AYTNNTDAS GTGI GAVI AVNHKVI KC ILLGVEHFFLRDLTGTAAYQI AS CAAAL I RAI VTLHPQ I THVNV 
AVEGNS SQDAGVAI ATVLNE I CS VPLS FLHHADKNTL IRS PI YMLGPEKAKAFES F I YALNSGTFS ASQT 
VVSHTIKLSFDPVAYLIDQIKAIRCIPLKDGGHTYCAK . 
45 >gi_4019255 

MLLLKAKKAIjMENLTEASSTQSETEWTVDTPTMITNIKKSERMAYSK^ 
LALKQPLPQTGTLRLLPSEKPYISQKLSNYVK^ 
FI IHLSSFMGCYVKKSTHIEPFQM^ 

KAS IFLI PRRHGKT W I WA 1 1 SML ITS VEHLHVGYVAHQKHVANS VFTE I IJTTLQKWFPSKNIDVKKENG 
TI I YKI PGKKPSTLiMCAS CFNKNS IRGQTFNLLYIDEANFIKKDSLPAILGF>ILQKDAKIjIFISSVNSGD 
5 KATS FLFNLKNAS EKMLN I VNY I CPDHKDDFS LQDSL I S CPCYKLY I PTY IT IDET I KNTTNLiFIiDGAFT 
TEIjMGDISVMSK^IHKVIGETAI^QFDLCRIDTTKPEITQCIxNS IMYLYIDPAYTNNSEASGTGIGAI I 
ALKNNS SKCI I VGIEHYFLKDLTGTATYQ I AS CACS L I RAALVLYPHI QAVHVAVEGNS SQDS AVAI STF 
LNECS P VKVNFMHYKDKTTAMQWP I YMLGSEKSQAFESFIYAINSGT IS AS QS 1 1 SOT 
EQIP^IROTPLRDGSHTYCAKKRTVSDDVLVAW>LAHFFSTSNKHIFKQLNSI 
10 >gi_4019257 

MLQKDAKL I F I S S VTTS GDKATS FLFNLKNAS EKMLNI VNY I CPDHKDDFS LQDS LIS CPCYKLY I PTYI T 
IDETIKNTTNLFLDGAFTTEM^ 

PAYTNNSEAS GTG I GAI I ALKNNS S KG 1 1 VG I EHYFLKDLTGTATYQ I AS CACSL I RAALVL Y PHI QAVH 
VAVEGNS S QDS A VAI STFLNECS P VKVNFMHYKDKTTAMQ WP I YMLGSEKS QAFES FI YAINSGT I S AS Q 
1 5 SIISNTIKLTFDPISYLIEQIRAIRCYPLFJDGSHTYC^K^^ 
I 

>gi_60355 

MliliLKAKKAI IENLSEVSSTQAETDWDMSTPTI ITNTSKSERTAYSKIGVI PSVNLYSSTLTSFCKLYHP 
LTLNQTQPQTGTLRLLPHEKPL ILQDLSNYVKLLTS QNA/CHDTO PTYLELRQ 
20 FVINLSSFLiNGCYVKRSTHIEPFQLQLILHTFYFLISIKSP 

KAS IFLI PRRHGKTWI WAI I SMLLTS VENI HVGY VAHQKHVANS VFTE I INTLQKWFPSRYIDIKKENG 
TI I YKSPDKKPSTIMCATCFNKNS I RGQTFNLLY I DE ANF I KKDSLPAI LGFMLQKDAKL I F I S S VNS GD 
RATSFLFNLKNASEKMLNIVOTICPDHKDDFSLQDSLI^ 

TELMGDMS GI S KSNMHKV I SEMAI TQFDLCRADTTKPE I TQCLNSTMY I YID PAYTNNS E AS GTGI GAI L 
TFKNNS S KC 1 I VGMEHYFLKDLTGTATYQ I AS CACS L I RAS LVLYPHI QCVHVAVEGNS S QDS A VAI STL 
IHECSPIKVYFIHYKDKTTTMQWPIYMLGAEK^ 

EQ IRS IRCYPLRDGS HTYCAKKRTVS DD VLVAVVMAYFFATSNKHI FKPLNST 
>gi_JS95201 

MLQKDAKIIFISSVNSSDQTTSFLYNLKNAKEKMLNW 
IDENIJODTTl^FMEGAFTTEIJyiGDGA 

D PAYTNNTEAS GTGMGA WSMKNS DRC VWGVEHFFLKELTGAS SLQI AS CAAAL I RSLATLHPFVRE AH 
VAI EGNS S QDS AVAI ATLLHERS PLP VKFLHHADKATGVQWPMY I LGAEKARAFETF I YALNSNTLS CGQ 
AI VSNTI KLS FDF VA YL IEQ IRAI KCYPLKDGTVS YCAKHKGGS DDTL VAVVMAHYFATS DRHVFKNHMK 
QI 

>gi_4 928 934 

MLLS S FRNHLQKNYEKYS VQAQNID WP VETP VLI S KDS KTNRLAHPLI GVI SR INLYS PTLKYYCDE YS T 
TKQPKFTPD IGYVRDLKKHDQYFLPKLQHHLSTLCEAYNHVDRQAQVEFNAS ILTLKAFNANGVLNELKQ 
FLIl^SCFLNGCYVSKSTCIELFQKQLIIjHTFYFL 

KSTVFL I PRRHGKTWI WAI I S VLLAS VENVHI GYVAHQKHVANAVFTE I ITTLYQWF P S KNI E I KKENG 
TI I YTKPGRKPSTLMCATCFNKNS I RGQTFNI LYVDEANF I KKEALPAI LGFMLQKDAKI I FI S SVNSAD 
KSTSFLFJSTLRNAKEKMLNVVir^ 

TELMGD I S TFPTS SMFKWEEQALFHFD I CRVDTTQ I DTVKI I DNVLYVYVD PAYTSNS EAS GTGI GAW 
PLKTKVKTI ILGIEHFYLKNLTGTASQQI AYCVTSMI KAILTLHPHINHVNVAVEGNS SQDS AVAI STF I 
NEYCPVPVFFAHCNJSRSSVFQWPIYILGSEKSQAFEKFIC^ 
Q I RAIRCLPLKDGS YT YCAKQKTMSDDTLVAVVMANYMA I SEKHTFKELCKT 
>gi_1632798 
ML YAS QRGRI/TENLRNALQQD STTQGCLGAET PS IMYTGAKSDRWAHPLVGT I HASNLYCPMLRAYCRHY 
GPRP VFVAS DESIiPMFGAS PALHTPVQVQMCIiliPELRDTLQRLL PPPNIiEDS EAIaTEFKTS VS S ARAI I*E 
DPNFLEKREFVTSIiASFLSGQYKffiCPARI^ 
IJ2KLHIFKQKASVFLIPRRHGKTWIVVAI 
5 R VE VNKETS TIT FRHS GK I S S TVMCAT CFNKNS I RGQT FHLiL F VDE ANF I KKEAL PA I L GFMLi QKD AK 1 1 
FI S S VETS ADQATS FliYKLKDAQERLLNVVS YVCQEHRQDFDMQDSMVS CPCFRLHI PS YITMDSNIRATT 
NLFliDGAFS TELiMGDTS SLS QGS LS RTVRDD AINQLELCRVDTIiNPRVAGRIiAS S LYVYVDP AYTNNTS A 
S GTGI AAVTHDRATJ PHRVI VLGLiEHFFLKDLTGDAALQ I AT CWALVS S I VTLHPHLEE VKVAVEGNS S Q 
DSAVAIAS I IGESCPLPCAFVHTKDKTSSLQWPMYLLT^ 
1 0 SFDPVLYLISQIRAIKPIPIiRDGTYTYTGKQR^ 
>gi_2337991 

MF YVK VMPALQKACEELQNQWS AKS GKWPVPETPLVAVETRRSERWPHP YLGLLPGVAAYS STIiED YCHIj 
YNP Y I DAXiTRCDLGQTHRRVATQP VLSDQLCQQLKKIjFS C PRNTS VKAKLEFT2 AAVRTHQALDNS QVFUE 
LKTFVIiNLSAFIiNKRYSDRSSHIEIiFQKQLIMHT^ 

1 5 F KQ KAS VFIj I PRRHGKTW I WA 1 1 S I LIAS VQDLR I G YVAHQKHVANA VFTE V INTLHT F F P GKYMD VKK 
ENGTI I FGLPNKKPSTLLCATCFNKNS IRGQTFQtiLFVDEANFI KKDALPTILGFMLQKDAKIIFISSSN 
S S DQS TS FliYI^KGASERMLNVVS YVCSNHKEDFSMQDGLI S CPCYSLHVPS YIS IDEQ I KTTTNI/FLDG 
VFDTELMGDSSCGTLSTFQI ISESALSQFELCRIDTAS PQVQAHIjNSTVHMYIDPAFTNNIjDASGTGI SV 
I GRLGAKTKVILGCEHFFLQKLTGTAALQ IAS CATS LLRS WI IHPMI KCAQ I T I EGNS S QDS AVAI ANF 

20 I DE CAP I P VTFYHQSDKTKGVLCPL YLLGQEKAVAFES F I YAMNLGLCKAS Qli I VSHT I KLS FD P VT YLIi 
EQVRAIKCQSLRDGSHTYHAKQKl^ 
>gi_2317977 

MLQKDAKI FF I S S VNS GEKTTS FLYl^KDANEKJXVNVVS YVCS EHMEDFNKQS AI TAG PCYRIiYVPEF I T 
INDNI KCTTNLIjLiEGS FATELMGNMQSHTE VS GNSM I HES SLTRLDFYRCIDTAGQGAPTTENTIjF VYI DP 
25 A YGNNVHAS GTGI VAMS HCKHTKKC I ILGLEHFFLNNLTGTAAHNI AS CATAIiLEGILFQHPWIQE IRCI 
I EGNSNQD S A VA I ATF I S HNT KL PTLFAS YRDKTGMQ WP I YML S GVKTL&FQNF ISS LNQGLL CAS QTW 
SOTVIiLSSDPISYLIEQIKNTKCIYHKNK^ 
>gi__6625593 

MF I AS KKS YFEAVYRSTVS SHSEEFWKSDDPVYFTQYKKQCNRIaPNAYLGTTiHS AS KYSENFRHYVATFS 
30 NS PLDFPQS VFNERNPCEYS VPYLDS ALQCS AKTLVGCS VSTTERNEYE VCKEATRCFKDAMSHKVLKVF 
LSNLS WFLKGHYKSKQAFLEPFQKQLILHSFMFVAS IKCPETTTKLFDEFKFIiIiDML YFDNTDLLTFLQK 
S PAFLI PRRHGKTW I VTA 1 1 SMLLTS VDDLHI GYVAHQKHVSLA VFIiE I SNI IxLAWF PRKNI D I KKENGV 
ILYSHPGKKSSTIjMCATCFNKNSIRGQTFNIjLFV^ 

TTS FLYNIiKD ANE KMVNWS YVCS EHMEDFNKQS AI TACPCYRLYVPEF I T INDNI KCTTNLLLEGS FAT 
3 5 ELMGNMQSHTE VS GNSMI HE S SLTRLDFYRCDTAGQGAPTTENTLF WI DPAYG^ GTGI VAMS HC 

KHTKKCI ILGLEHFFIiNNIiTGTAAHNIAS CATA T iIiKGIIjFQHP WX QS IRCI IEGNSNQDS AVAIATFI SH 
NI KLPTLFAS YRD KTGMQWP I YKLS GDKTLAFQNF ISS LNQGLIaCAS QTVVSNTVLLS SDP I S YL I EQ I K 
NTKC I YHKNKT ITFQS KTHTMSDD VL I AC S YI S FS IK 
Table 3. Degenerate primers generated by CODEHOP 



Block x7263xbliD

T L Y V Y I D P
oligo: 5'-AACCTGTACGTGtayntngaycc-3' degen=64 temp=33.4 Extend clamp

T L Y V Y I D P A
oligo: 5'-AACCTGTACGTGTACntngayccngc-3' degen=128 temp=36.G Extend clamp

T L Y V Y I D P A Y
oligo: 5'-AACCTGTACGTGTACATngayccngcnt-3' degen=128 temp=42.5 Extend clamp

Complement of Block x7263xbliD

Y I D P A Y T N N T
atrnanctrggGCGGATGTGGTTGTTGT oligo: 5'-TGTTGTTGGTGTAGGCGggrtcnanrta-3' degen=64 temp=62.9

D P A Y T N N T R A
anctrggncgnaTGTGGTTGTTGTGGGTCCG oligo: 5'-GCCTGGGTGTTGTTGGTGTangcnggrtcna-3' degen=128 temp=61.8

D P A Y T N N T R A
ctrggncgnawGTGGTTGTTGTGGGTCCG oligo: 5'-GCCTGGGTGTTGTTGGTGwangcnggrtc-3' degen=64 temp=61.0



Block x7263xbliE

C I I F G M E H F F
oligo: 5'-TGGATCATCTTCGGCATngarcaytwyt-3' degen=64 temp=55.7 Extend clamp

I F G M E H F F L
oligo: 5'-CATCTTCGGCATGGAGcaytwytwyyt-3' degen=64 temp=62.0

Complement of Block x7263xbliE

E H F F L R D L T G
ctygtrawrawGGACTTCCTGGACTGCCC oligo: 5'-CCCGTCAGGTCCTTCAGGwarwartgytc-3' degen=32 temp=61.7

H F F L R D L T G
tygtrawrawrrACTTCCTGGACTGCCCG oligo: 5'-GCCCGTCAGGTCCTTCArrwarwartgyt-3' degen=128 temp=60.8

H F F L R D L T G
gtrawrawrraCTTCCTGGACTGCCCG oligo: 5'-GCCCGTCAGGTCCTTCarrwarwartg-3' degen=64 temp=60.8

Block x7263xbliF

E V H I A V E G N
oligo: 5'-GGACGTGCACGTCGCCrtngarggnaa-3' degen=64 temp=63.8

Complement of Block x7263xbliF

E G N S S Q D S A
anctyccnttrwGGTTGGTCCTGAGGCGG oligo: 5'-GGCGGAGTCCTGGTTGGwrttnccytcna-3' degen=128 temp=62.7

E G N S S Q D S A V
ctyccnttrwsGTTGGTCCTGAGGCGGC oligo: 5'-CGGCGGAGTCCTGGTTGswrttnccytc-3' degen=64 temp=63.9
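
The degen values listed in Table 3 can be checked by hand: each oligo has an upper-case consensus clamp at the 5' end and a lower-case degenerate core at the 3' end, and the degeneracy is the product of the number of bases each IUPAC ambiguity code stands for. The following is a minimal illustrative sketch in Python, not part of CODEHOP itself; the function names `degeneracy` and `split_clamp_and_core` are hypothetical and were chosen for this example only.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(oligo: str) -> int:
    """Return the number of distinct sequences a degenerate oligo represents."""
    return prod(len(IUPAC[base]) for base in oligo.upper())


def split_clamp_and_core(oligo: str) -> tuple[str, str]:
    """Split an oligo written in the Table 3 style into its upper-case
    5' clamp and its lower-case 3' degenerate core (hypothetical helper)."""
    for index, base in enumerate(oligo):
        if base.islower():
            return oligo[:index], oligo[index:]
    return oligo, ""  # no degenerate core found


if __name__ == "__main__":
    # Oligos copied from Table 3 (5' to 3').
    for oligo in (
        "AACCTGTACGTGtayntngaycc",
        "GCCTGGGTGTTGTTGGTGTangcnggrtcna",
        "CCCGTCAGGTCCTTCAGGwarwartgytc",
    ):
        clamp, core = split_clamp_and_core(oligo)
        print(clamp, core, degeneracy(oligo))
```

Because the clamp contains only unambiguous bases, it contributes a factor of 1, so the degeneracy of the whole oligo equals that of its core.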
CLAIMS 

1. A method of designing a panel of degenerate primer pairs for screening for new 
members of multiple known virus families in a biological sample, wherein each 
primer pair in the panel binds a sequence that is conserved across members of a said 
virus family and selectively directs amplification of sequence of said family by PCR, 
which method comprises 

(a) providing a plurality of amino acid sequences from members of a first virus 
family, 

(b) comparing the sequences to identify conserved regions, 

(c) designing a first primer pair using a computer based method, wherein each primer 
in the pair binds a nucleotide sequence that encodes a conserved region identified in 
(b) and wherein the primer pair is designed to amplify by PCR the nucleotide 
sequence between the nucleotide sequences that encode conserved regions in 
members of the first virus family, and 

(d) repeating steps (a) to (c) for each virus family. 

2. A method of designing a panel of degenerate primer pairs for screening for new 
members of multiple known virus families in a biological sample, wherein each 
primer pair in the panel binds a sequence that is conserved across members of a said 
virus family and selectively directs amplification of sequence of said family by PCR, 
which method comprises 

(a) providing a plurality of nucleotide sequences from members of a first virus 
family, 

(b) comparing the sequences to identify conserved regions, 

(c) designing a first primer pair using a computer based method, wherein each primer 
in the pair binds a conserved region identified in (b) and wherein the primer pair is 
designed to amplify by PCR the nucleotide sequence between the conserved regions 
in members of the first virus family, and 

(d) repeating steps (a) to (c) for each virus family. 

3. A method according to claim 1 or 2 which further comprises synthesising one or 
more of the primer pairs and determining optimal conditions for using the primer 
pairs in PCR. 

4. A method according to any one of the preceding claims which comprises testing 
the ability of one or more of the primer pairs to amplify a nucleotide sequence that 
encodes an amino acid as defined in claim 1(a) or a nucleotide sequence as defined in 
claim 2(a). 

5. A method according to claim 3 or 4 which comprises testing the primer pair(s) in a 
range of buffer conditions to determine the optimal buffer conditions for PCR. 

6. A method according to any one of claims 3 to 5 which comprises testing the 
primer pair(s) at a range of different temperatures to determine the optimal 
temperature for PCR. 

7. A method according to any one of the preceding claims which comprises 
identifying one or more groups of primer pairs wherein the primer pairs in each 
group have similar optimal conditions of use in PCR such that they can be used 
optimally in the same reaction vessel. 

8. A method according to claim 7 wherein each primer pair in a group generates a 
PCR product of a different size to the other primer pair(s) in the group. 
9. A method according to claim 7 or 8 wherein each primer pair in a group carries a 
different label from the other primer pair(s) in the group. 

10. A method according to claim 9 wherein each primer pair in a group carries a 
differently-coloured fluorescent label. 

11. A method according to any one of the preceding claims wherein the biological 
sample is a single-source sample from a single individual or is a pooled sample from 
more than one individual of the same species. 

12. A method according to claim 11 wherein the biological sample is a human 
sample. 

13. A method according to any one of the preceding claims wherein at least 50% of 
the primer pairs bind a sequence that is conserved across all of the genuses and/or 
subfamilies. 

14. A panel of primers designed according to any one of the preceding claims. 
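
The grouping recited in claims 7 and 8 can be illustrated with a short sketch. The Python below is an assumption-laden example rather than the claimed method: it takes the "optimal conditions" to be a single annealing temperature per primer pair, and the names `PrimerPair` and `group_for_multiplex`, the 2.0 degree tolerance and the 50 bp minimum size gap are all hypothetical values chosen for illustration. Pairs are placed greedily into the first group in which every member has a similar optimal temperature and a distinguishable product size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerPair:
    name: str               # hypothetical label for the primer pair
    optimal_temp_c: float   # assumed optimal annealing temperature (deg C)
    product_size_bp: int    # expected PCR product size in base pairs


def group_for_multiplex(pairs, temp_tolerance_c=2.0, min_size_gap_bp=50):
    """Greedily group primer pairs so that, within a group, optimal
    temperatures differ by at most temp_tolerance_c and product sizes
    differ by at least min_size_gap_bp (both thresholds are assumptions)."""
    groups = []
    for pair in sorted(pairs, key=lambda p: p.optimal_temp_c):
        for group in groups:
            temps_ok = all(
                abs(pair.optimal_temp_c - member.optimal_temp_c) <= temp_tolerance_c
                for member in group
            )
            sizes_ok = all(
                abs(pair.product_size_bp - member.product_size_bp) >= min_size_gap_bp
                for member in group
            )
            if temps_ok and sizes_ok:
                group.append(pair)
                break
        else:
            # No compatible group exists, so start a new reaction vessel.
            groups.append([pair])
    return groups


if __name__ == "__main__":
    panel = [
        PrimerPair("pair_A", 55.0, 300),
        PrimerPair("pair_B", 56.0, 500),
        PrimerPair("pair_C", 56.0, 320),
        PrimerPair("pair_D", 62.0, 300),
    ]
    for vessel in group_for_multiplex(panel):
        print([pair.name for pair in vessel])
```

A greedy pass keeps the sketch short but does not guarantee the smallest number of reaction vessels. Distinct labels, as in claims 9 and 10, would be an alternative to the size-gap test when products of similar size must share a vessel.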